Abstract

This paper introduces U-relations, a succinct and purely
relational representation system for uncertain databases. 
U-relations support attribute-level uncertainty using verti- 
cal partitioning. If we consider positive relational algebra 
extended by an operation for computing possible answers, 
a query on the logical level can be translated into, and 
evaluated as, a single relational algebra query on the U- 
relation representation. The translation scheme essentially 
preserves the size of the query in terms of number of oper- 
ations and, in particular, number of joins. Standard tech- 
niques employed in off-the-shelf relational database man- 
agement systems are effective for optimizing and processing 
queries on U-relations. In our experiments we show that 
query evaluation on U-relations scales to large amounts of 
data with high degrees of uncertainty. 

1 Introduction 

Several recent works [10] [9] [8] [2] [14] [4] [6] aim at developing
scalable representation systems and query processing
techniques for large collections of uncertain data as they 
arise in data cleaning, Web data management, and scientific 
databases. Most of them are based on a possible worlds 
semantics, and for all of them such a semantics can be con- 
veniently defined. 

Four desiderata for representation systems for incom- 
plete information appear important. 

1. Expressiveness. The representation should be closed 
under the application of (relational algebra) queries and data 
cleaning algorithms (which remove some possible worlds). 
That is, the results of such operations to the represented data 
should be again representable within the formalism. 

2. Succinctness. It should be possible to represent large 
sets of alternative worlds using fairly little space. 

3. Efficient query evaluation. A trade-off is required be- 
tween the succinctness of a representation formalism and 
the complexity of evaluating interesting queries. This trade-off
follows from established theoretical results [1] [11] [6].



However, while the formalisms in the literature tend to dif- 
fer in succinctness, several have polynomial-time data com- 
plexity for (decision) problems such as tuple possibility un- 
der positive (but not full) relational algebra. This includes
v-tables [12] [11], uncertainty-lineage databases (ULDBs) [8],
and world-set decompositions (WSDs) [5].

4. Ease of use for developers and researchers in the sense 
that the representation system can be easily put on top of a 
relational DBMS. This in particular includes that queries on 
the logical schema level can be translated down to, ideally, 
relational algebra queries on the representation relations and 
that this translation is simple and easy to implement. This 
goal is motivated by the availability and maturity of existing 
relational database technology. 

An important aspect of a representation system is
whether it represents uncertainty at the attribute level or the
tuple level. Attribute-level representation refers to the succinct
representation of relations in which two or more fields
of the same tuple can independently take alternative values
(see also [6]). Attribute-level representation of uncertainty
(as supported by c-tables [12] and WSDs) offers finer granularity
of independence than tuple-level approaches such as
[8] [10] [2]. This is useful in applications like data cleaning
in which the values of several fields of a single tuple can
be independently uncertain. For instance, the U.S. Census
Bureau maintains relations with dozens of columns (> 50),
most of which may require cleaning [4].

U-relations. In this paper, we develop and study U-rela- 
tions, a representation system that we introduce with the 
following example. 

Example 1.1. Let us assume that an aerial photograph of 
a battlefield shows four vehicles at distinct positions 1 to 4. 
The resolution of the image does not allow for the identifi- 
cation of vehicle types, but we can draw certain conclusions 
from earlier reconnaissance and a calculation of the maxi- 
mum distance each vehicle may have covered since. Say we 
know that vehicle 1 is (a) a friendly tank. Vehicles 2 and 3 
are (b) a friendly transport and (c) an enemy tank, but we 
do not know which one is which. Nothing is known about
vehicle 4. Figure 1(a) shows a schematic drawing of how this
scenario can arise. Only 1 is in the range of (a); 2 and 3 are
in the ranges of (b) and (c); and position 4 is near the border
of the photograph but outside the ranges of (a), (b), and (c),
so this vehicle must have newly moved onto the map.

Figure 1. Map with moving vehicles (a) and U-relational database representation of the possible
worlds at the time the aerial photograph detecting vehicles 1, 2, 3, 4 was taken (b).

We want to model this by an uncertain database of
schema R(Id, Coord, Type, Faction), representing the ids
(1 to 4), coordinate positions, types, and factions of the vehicles
on the map. Let us assume there are only two vehicle
types (tank or transport) and two factions (friend or enemy).
Then there are eight possible worlds. We obtain one by making
three choices, that is, by answering the following questions: Has
the friendly transport (b) now become vehicle 2 ($x \mapsto 1$) or 3
($x \mapsto 2$)? Is vehicle 4 a tank ($y \mapsto 1$) or a transport ($y \mapsto 2$)?
Is vehicle 4 friendly ($z \mapsto 1$) or an enemy ($z \mapsto 2$)? Thus the
uncertainty can be naturally modelled using three variables
x, y, z that each can independently take one of two values.

We model this scenario by the U-relational database
shown in Figure 1(b). We use vertical partitioning (cf. e.g.
[15]) to achieve attribute-level representation. R is represented
using four U-relations, one for each column of R.
The U-relation for the coordinate positions (which are all
certain) is not shown since we do not use it subsequently,
but of course, conceptually, coordinate positions
are an important feature of the example and have to be part
of the schema. In addition there is a relation W which defines
the possible values the three variables can take.

We can compute a vertical decomposition of one world
given by a valuation $\theta$ of the variables x, y, z by (1) removing
all the tuples from the U-relations whose D columns contain
assignments that are inconsistent with $\theta$ and
then (2) projecting the D columns away. (For example, if
$\theta = \{x \mapsto 1, y \mapsto 1, z \mapsto 1\}$, then step (1) removes the third and
fifth tuples of $U_1$ and the fifth tuples of $U_2$ and $U_3$.) Of course we can
resolve the vertical partitioning by joining the decomposed
relations on the tuple id columns $T_R$. □
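
The two steps of Example 1.1 (drop inconsistent rows, project the D columns away, then join on the tuple id) can be sketched in Python. This is a minimal illustration under stated assumptions: the names `U_ID`, `U_TYPE`, `U_FACTION`, `world`, and the tuple ids `t1` to `t4` are hypothetical, and the rows are derived from the scenario described in the text rather than copied from Figure 1(b), so their order need not match the figure.

```python
from itertools import product

# World-table W: each variable with its possible values (assumed encoding).
W = {"x": [1, 2], "y": [1, 2], "z": [1, 2]}

# One U-relation per column of R; each row is (ws-descriptor, tuple id, value).
# An empty descriptor {} means the row holds in every world.
U_ID = [({}, f"t{i}", i) for i in (1, 2, 3, 4)]
U_TYPE = [
    ({}, "t1", "Tank"),
    ({"x": 1}, "t2", "Transport"), ({"x": 2}, "t2", "Tank"),
    ({"x": 1}, "t3", "Tank"), ({"x": 2}, "t3", "Transport"),
    ({"y": 1}, "t4", "Tank"), ({"y": 2}, "t4", "Transport"),
]
U_FACTION = [
    ({}, "t1", "Friend"),
    ({"x": 1}, "t2", "Friend"), ({"x": 2}, "t2", "Enemy"),
    ({"x": 1}, "t3", "Enemy"), ({"x": 2}, "t3", "Friend"),
    ({"z": 1}, "t4", "Friend"), ({"z": 2}, "t4", "Enemy"),
]


def consistent(descriptor, valuation):
    """True if the total valuation extends the ws-descriptor."""
    return all(valuation[var] == val for var, val in descriptor.items())


def column_in_world(u_relation, valuation):
    """Drop rows inconsistent with the valuation, then project D away."""
    return {tid: val for d, tid, val in u_relation if consistent(d, valuation)}


def world(valuation):
    """Join the per-column results on the tuple id to rebuild R."""
    ids, types, factions = (column_in_world(u, valuation)
                            for u in (U_ID, U_TYPE, U_FACTION))
    return sorted((ids[t], types[t], factions[t])
                  for t in ids if t in types and t in factions)


# Enumerate every total valuation of x, y, z and build the matching world.
all_worlds = [world(dict(zip(W, vals))) for vals in product(*W.values())]
```

Calling `world({"x": 1, "y": 1, "z": 1})` in this sketch yields the world in which vehicle 2 is the friendly transport, vehicle 3 the enemy tank, and vehicle 4 a friendly tank.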

U-relations have the following properties: 

• Expressiveness: U-relations are complete for finite sets 
of possible worlds, that is, they allow for the representa- 
tion of any finite world-set. 

• Succinctness: U-relations represent uncertainty on the 
attribute level. Even though they allow for more efficient 



query evaluation, U-relations are, as we show, exponen- 
tially more succinct than ULDBs and WSDs. That is, 
there are (relevant) world-sets that necessarily take ex- 
ponentially more space to represent by ULDBs or WSDs 
than by U-relations. 

• Leveraging RDBMS technology: U-relations allow for 
a large class of queries (positive relational algebra ex- 
tended by the operation "possible") to be processed us- 
ing relational algebra only, and thus efficiently in the 
size of the data. Our approach is the first so far to achieve 
this for the above-named query language. Indeed, this 
not only settles that there is a succinct and complete 
attribute-level representation for which the so-called tu- 
ple Q-possibility problem for positive relational algebra 
is in polynomial time (previously open [6]) but puts a
rich body of research results and technology at our dis- 
posal for building uncertain database systems. 

This makes U-relations the most efficient and scalable 
approach to managing uncertain databases to date. 

• Parsimonious translation: The translation from rela- 
tional algebra expressions on the logical schema level to 
query plans on the physical representations replaces a se- 
lection by a selection, a projection by a projection, a join 
by a join (however, with a more intricate join condition), 
and a "possible" operation by a projection. We have ob- 
served that state-of-the-art RDBMS do well at finding 
efficient query plans for such physical-level queries. 

Ease of use: A main strength of U-relations is their sim- 
plicity and low "cost of ownership": 

• The representation system is purely relational and in 
close analogy with relational representation schemes for 
vertically decomposed data. Apart from the column 
store relations that represent the actual data, there is only 
a single auxiliary relation W (which we need for comput- 
ing certain answers, but not for possible answers). 

• Query evaluation can be fully expressed in relational al- 
gebra. The translation is quite simple and can even be 
done by hand, at least for moderately-sized queries. 

• The query plans obtained by our translation scheme are 
usually handled well by the query optimizers of off-the- 
shelf relational DBMS, so the implementation of special 
operators and optimizer extensions is not strictly needed
for acceptable performance. 

Thus U-relations are not only suited as a representation
system for dedicated uncertain database implementations
such as MayBMS [4], but are also relevant to "casual
users" of representation systems for uncertain data, such as 
researchers in data cleaning and data integration who want 
to store and query uncertain data without great effort. 

Apart from those implicitly mentioned above, we make 
the following further contributions in this paper. 

• We study algebraic query optimization and present 
equivalences that hold on vertically decomposed repre- 
sentations. We address query optimization using them in 
the context of managing uncertainty with U-relations. 

• We present an algorithm for normalizing a U-relational 
representation obtained from a query. Normalized U- 
relational databases yield a conceptually simple algo- 
rithm for computing the certain answers of queries. In 
particular, certain answer tuples on normalized tuple- 
level representations can be computed using relational 
algebra only, which is not true in general for previous 
representation systems. 

• We provide experimental evidence for the efficiency and 
relevance of our approach. 

The structure of the paper is as follows. Section 2 establishes
U-relations formally. Section 3 presents our reduction
from queries on the logical level to relational algebra
on the level of U-relations and addresses algebraic
query evaluation. Section 4 presents the normalization algorithm.
Section 5 discusses the relationship between U-relations,
WSDs and ULDBs and argues that U-relations
combine the advantages of the other two formalisms without
sharing their drawbacks. In Section 6 we report on our
experiments with U-relations. We conclude with Section 7.

2 U-relational databases 

We define world-sets in close analogy to the case of c-tables
[12]. Consider a finite set of variables over finite domains.
A possible world is represented by a total valuation
(or assignment) $f : \mathit{Var} \to \mathit{Rng}$ of variables to constants in
their domains, and the world-set is represented by the finite
set of all total valuations.¹ We represent relationally the
variable set and the associated domains by a world-table
over schema W(Var, Rng) such that W consists of all pairs
(x, v) of variables x and values v in the domain of x.

Example 2.1. The world-table W in Figure 1 defines three
variables x, y, z, whose common domain is {1, 2}. The number
of worlds defined by W is 2 · 2 · 2 = 8. □

¹ This is a generalization of world-set decompositions of [5], where
component ids are variables and local world ids are domain values.



Given a world-table W, a world-set descriptor over W, or 
ws-descriptor for short, is a valuation d such that its graph is 
a subset of W. If d is a total valuation, then it represents one 
world. In our examples, to represent the entire world-set we 
use an empty ws-descriptor, as a shortcut for a singleton ws- 
descriptor with a new variable with a singleton domain. 
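
As an illustration of these definitions, the following Python sketch models a world-table as a set of (variable, value) pairs and a ws-descriptor as a dictionary. The function names (`domains`, `num_worlds`, `is_ws_descriptor`, `is_total`, `worlds_described`) are hypothetical helpers introduced here, not part of the formalism.

```python
from math import prod

# World-table as a set of (variable, value) pairs, mirroring W(Var, Rng).
W = {("x", 1), ("x", 2), ("y", 1), ("y", 2), ("z", 1), ("z", 2)}


def domains(world_table):
    """Group the world-table by variable: variable -> set of values."""
    doms = {}
    for var, val in world_table:
        doms.setdefault(var, set()).add(val)
    return doms


def num_worlds(world_table):
    """Number of total valuations: the product of the domain sizes."""
    return prod(len(vals) for vals in domains(world_table).values())


def is_ws_descriptor(d, world_table):
    """A valuation d (dict) is a ws-descriptor iff its graph is a subset of W."""
    return set(d.items()) <= world_table


def is_total(d, world_table):
    """A total ws-descriptor assigns every variable and denotes one world."""
    return is_ws_descriptor(d, world_table) and set(d) == set(domains(world_table))


def worlds_described(d, world_table):
    """Number of total valuations that extend the ws-descriptor d."""
    return prod(len(vals) for var, vals in domains(world_table).items()
                if var not in d)
```

In this sketch the empty dictionary plays the role of the empty ws-descriptor: it is extended by every total valuation and therefore describes the entire world-set.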

We are now ready to define databases of U-relations. 

Definition 2.2. A U-relational database for a world-set
over schema $\Sigma = (R_1[A_1], \ldots, R_k[A_k])$ is a tuple

$(U_{1,1}, \ldots, U_{1,m_1}, \ldots, U_{k,1}, \ldots, U_{k,m_k}, W)$,

where W is a world-table and each relation $U_{i,j}$ has schema
$U_{i,j}[D_{i,j}; T_{R_i}; B_{i,j}]$ such that $D_{i,j}$ defines ws-descriptors over
W, $T_{R_i}$ defines tuple ids, and $B_{i,1} \cup \cdots \cup B_{i,m_i} = A_i$.

A ws-descriptor $\{c_1 \mapsto l_1, \ldots, c_k \mapsto l_k\}$ is relationally
encoded in $\pi_{D_{i,j}}(U_{i,j})$ of arity $n \geq k$ as a tuple
$(c_1 \mapsto l_1, \ldots, c_k \mapsto l_k, c_{k+1} \mapsto l_{k+1}, \ldots, c_n \mapsto l_n)$, where each
$c_i \mapsto l_i$ with $k < i \leq n$ repeats some $c_j \mapsto l_j$ with $1 \leq j \leq k$.

Although we speak of vertical partitioning, we do not
require the value columns of the $U_{i,j}$ to disjointly partition the
columns of $R_i$. Indeed, overlap may be useful to speed up
query evaluation, see e.g. [15].

We next define the semantics of a U-relational database.
To obtain a possible world we first choose a total valuation
f over W. We then process the U-relations tuple by tuple.
If the function f extends² the ws-descriptor d of a tuple of
the form (d, t, a) from a U-relation of schema (D, T, A), we
insert in that world the values a into the A-fields of the tuple
with identifier t. In general this may leave some tuples partial
in the end (i.e., the values for some fields have not been
provided). These tuples are removed from the world.

We require, for a U-relational database $(U_1, \ldots, U_n, W)$
to be considered valid, that the representation does not provide
several contradictory values for a tuple field in the
same world. Formally, we require, for all $1 \leq i, j \leq n$ and
tuples $t_1 \in U_i[D_i, T_i, A_i]$ and $t_2 \in U_j[D_j, T_j, A_j]$ such that
$U_i$ and $U_j$ are vertical partitions of the same relation and
$t_1$ and $t_2$ carry the same tuple id, that if
there is a world that extends both $t_1.D_i$ and $t_2.D_j$, then for
all $A \in (A_i \cap A_j)$, $t_1.A = t_2.A$ must hold.

Example 2.3. Suppose there are two U-relations with
schemata $U_1[D; T_R; A, B]$ and $U_2[D; T_R; B, C]$ that jointly
represent columns A, B, and C of a relation R. Assume tuples
$(c_1 \mapsto 1, t_1, a, b) \in U_1$ and $(c_2 \mapsto 2, t_1, b', c) \in U_2$. Then $U_1$
and $U_2$ cannot form part of a valid U-relational database because
there would be a world with $c_1 \mapsto 1$, $c_2 \mapsto 2$ in which
the tuple from $U_1$ requires field $t_1.B$ to take value $b$ while
the tuple from $U_2$ requires the same field to take value $b'$. □
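
The validity requirement can be checked mechanically. The sketch below is one possible formulation, assuming rows of the form (ws-descriptor, tuple id, attribute-to-value dictionary); the helper names `compatible` and `conflicts` are hypothetical. It reports the pair of rows from Example 2.3 as a conflict.

```python
def compatible(d1, d2):
    """Two ws-descriptors share a world iff they agree on their common variables
    (assuming every variable has a non-empty domain)."""
    return all(d2[var] == val for var, val in d1.items() if var in d2)


def conflicts(u1, u2):
    """Return pairs of rows that give different values to the same field of the
    same tuple in some common world.

    Each row is (descriptor, tuple_id, {attribute: value}).
    """
    found = []
    for d1, tid1, vals1 in u1:
        for d2, tid2, vals2 in u2:
            if tid1 != tid2 or not compatible(d1, d2):
                continue
            shared = vals1.keys() & vals2.keys()
            if any(vals1[a] != vals2[a] for a in shared):
                found.append(((d1, tid1, vals1), (d2, tid2, vals2)))
    return found


# The two rows of Example 2.3: both define field B of tuple t1.
U1 = [({"c1": 1}, "t1", {"A": "a", "B": "b"})]
U2 = [({"c2": 2}, "t1", {"B": "b'", "C": "c"})]
```

A pair of partitions is valid in the sense above when `conflicts` returns an empty list, as it would if the second row used the descriptor `{"c1": 2}` instead.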

A salient property of U-relational databases is that they 
form a complete representation system for finite world-sets. 

Theorem 2.4. Any finite set of worlds can be represented 
as a U-relational database. 

² That is, for all x on which d is defined, d(x) = f(x).
3 Query Processing

The semantics of a query Q on a world-set is to evaluate
Q in each world. For complete representation systems like
U-relational databases, there is an equivalent, more efficient
approach [12]: Translate Q into a query Q' such that the
evaluation of Q' on a U-relational encoding of the world-set
produces the U-relational encoding of the answer to Q.

Queries on vertical decompositions. U-relations rely es- 
sentially on vertical decomposition for succinct (attribute- 
level) representation of uncertainty. To evaluate a query, we 
first need to reconstruct relations from vertical decomposi- 
tions by (1) joining two partitions on the common tuple id 
attributes and (2) discarding the combinations that yield in- 
consistent ws-descriptors. We call this operation merge and
give its precise definition in Figure 4, where the two above
conditions are defined by $\alpha$ and $\psi$, respectively.

Example 3.1. Consider the U-relational database of Figure 1.
The query $\sigma_{Faction='Enemy' \wedge Type='Tank'}(R)$ lists the enemy
tanks on the map. To answer this query, we need to
merge the necessary partitions of R and obtain a new query
with $merge(\pi_{Faction}(R), \pi_{Type}(R))$ in the place of R. □
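
A minimal Python sketch of the merge operation follows: it joins two partitions on the tuple id (the $\alpha$-condition) and discards combinations with inconsistent ws-descriptors (the $\psi$-condition). The row layout, the helper names `psi`, `merge`, `select`, and the sample rows (derived from the vehicle scenario, not copied from Figure 1) are assumptions made for illustration.

```python
def psi(d1, d2):
    """Consistency: no variable is mapped to two different values."""
    return all(d2[var] == val for var, val in d1.items() if var in d2)


def merge(u1, u2):
    """Join two partitions on the tuple id (alpha) and keep only combinations
    with consistent ws-descriptors (psi)."""
    out = []
    for d1, tid1, vals1 in u1:
        for d2, tid2, vals2 in u2:
            if tid1 == tid2 and psi(d1, d2):
                out.append(({**d1, **d2}, tid1, {**vals1, **vals2}))
    return out


def select(u, predicate):
    """Selection applies its condition to the value attributes only."""
    return [row for row in u if predicate(row[2])]


# Rows are (ws-descriptor, tuple id, {attribute: value}).
U_TYPE = [
    ({}, "t1", {"Type": "Tank"}),
    ({"x": 1}, "t2", {"Type": "Transport"}), ({"x": 2}, "t2", {"Type": "Tank"}),
    ({"x": 1}, "t3", {"Type": "Tank"}), ({"x": 2}, "t3", {"Type": "Transport"}),
    ({"y": 1}, "t4", {"Type": "Tank"}), ({"y": 2}, "t4", {"Type": "Transport"}),
]
U_FACTION = [
    ({}, "t1", {"Faction": "Friend"}),
    ({"x": 1}, "t2", {"Faction": "Friend"}), ({"x": 2}, "t2", {"Faction": "Enemy"}),
    ({"x": 1}, "t3", {"Faction": "Enemy"}), ({"x": 2}, "t3", {"Faction": "Friend"}),
    ({"z": 1}, "t4", {"Faction": "Friend"}), ({"z": 2}, "t4", {"Faction": "Enemy"}),
]

enemy_tanks = select(
    merge(U_FACTION, U_TYPE),
    lambda v: v["Faction"] == "Enemy" and v["Type"] == "Tank",
)
```

Note that the combination of the Faction row with $x \mapsto 1$ and the Type row with $x \mapsto 2$ for the same tuple is rejected by `psi`, so no row claims an enemy transport that is simultaneously a tank.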

Our query evaluation approach can take full advantage
of query evaluation and optimization techniques on vertical
partitions. First, it does not require reconstructing the entire
relations involved in the query, but rather only the necessary
vertical partitions. Second, necessary partitions can be flexibly
merged in during query evaluation. Thus early and late
tuple materialization [15] carry over naturally to our framework.
For this, our merge operator allows merging two
partitions not only if they are given in their original form,
but also if they have been modified by queries.

The first advantage only holds for so-called reduced U- 
relational databases, which do not have tuples that cannot 
be completed in any world. That is, each tuple of a reduced 
U-relation can always be completed to an actual tuple in a 
world. The advantage becomes evident even for a simple 
projection query. Consider a reduced database containing 
a U-relation U defining the A attribute of R. To evaluate 
$\pi_A(R)$ we do not need to merge in all U-relations defining
the attributes of R and later project on A. Instead, the an- 
swer is simply U. In the following, we assume that the 
input database is always reduced. As we will discuss next, 
our query evaluation technique always produces reduced U- 
relations for reduced input U-relational databases. 

Example 3.2. Consider the following non-reduced
database of two U-relations:

$U_1$   D                  T       A
        $c_1 \mapsto 1$    $t_1$   $a_1$
        $c_2 \mapsto 1$    $t_2$   $a_2$

$U_2$   D                  T       B
        $c_1 \mapsto 1$    $t_1$   $b_1$
        $c_1 \mapsto 2$    $t_1$   $b_2$

In each U-relation the second tuple cannot find a partner in
the other U-relation with which a complete tuple (with both
attributes A and B) can be formed. If these second tuples
are removed, the database is reduced. □

$merge(\pi_X(R), \pi_{A - X}(R)) = R$, where $A = sch(R)$   (1)

$merge(R, S) = merge(S, R)$   (2)

$merge(merge(R, S), T) = merge(R, merge(S, T))$   (3)

$\sigma_{\phi(X)}(merge(R, S)) = merge(\sigma_{\phi(X)}(R), S)$, where $X \subseteq sch(R)$   (4)

$merge(R, S) \bowtie_{\phi(X,Y)} T = merge(R \bowtie_{\phi(X,Y)} T, S)$, where $X \cup Y \subseteq sch(R) \cup sch(T)$   (5)

$\pi_X(merge(R, S)) = merge(\pi_{X \cap A}(R), \pi_{X \cap B}(S))$, where $sch(R) = A$, $sch(S) = B$   (6)

Figure 2. Algebraic equivalences for relational
algebra queries with merge operator.

We can always reduce a U-relational database as follows:
We filter each U-relation using semijoins with each of the
other U-relations representing data of the same relation $R_i$.
The semijoin conditions are the $\alpha$- and $\psi$-conditions.

Proposition 3.3. Given a schema $\Sigma$, there is a relational
algebra query that reduces a U-relational database over $\Sigma$.
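
The semijoin-based reduction described above can be sketched as follows. The code filters each partition against every other partition of the same relation, using tuple id equality and ws-descriptor consistency as the semijoin condition. The helper names (`psi`, `semijoin`, `reduce_partitions`) and the sample rows are hypothetical; the sample uses two partitions only.

```python
def psi(d1, d2):
    """Consistency: no variable is mapped to two different values."""
    return all(d2[var] == val for var, val in d1.items() if var in d2)


def semijoin(u, other):
    """Keep the rows of u that have at least one partner in `other` with the
    same tuple id (alpha) and a consistent ws-descriptor (psi)."""
    return [(d, tid, val) for d, tid, val in u
            if any(tid == tid2 and psi(d, d2) for d2, tid2, _ in other)]


def reduce_partitions(partitions):
    """Filter each partition by a semijoin with each of the other partitions."""
    reduced = []
    for i, u in enumerate(partitions):
        for j, other in enumerate(partitions):
            if i != j:
                u = semijoin(u, other)
        reduced.append(u)
    return reduced


# Hypothetical non-reduced partitions of a relation with attributes A and B.
U1 = [({"c1": 1}, "t1", "a1"), ({"c2": 1}, "t2", "a2")]
U2 = [({"c1": 1}, "t1", "b1"), ({"c1": 2}, "t1", "b2")]

R1, R2 = reduce_partitions([U1, U2])
```

In this sample the second row of `U1` is dropped because no row of `U2` has tuple id `t2`, and the second row of `U2` is dropped because its descriptor conflicts with the only `t1` row of `U1`.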

Algebraic equivalences. Figure 2 gives algebraic equivalences
of relational algebra expressions with merge operator
on vertical decompositions: Merging is the reverse of ver- 
tical partitioning, it is commutative and associative, it com- 
mutes with selections, joins, and projections. 

Standard heuristics known from classical query opti- 
mization for relational algebra apply here as well. Intu- 
itively, we usually push down projections and selections and 
merge in U-relations as late as possible. An interesting new 
case is the decision on join ordering among an explicit join 
from the input query and a join due to merging: If the merge 
is executed before the explicit join, it may reduce the size of 
an input relation to join. We have seen in our experiments 
that the standard selectivity-based cost measures employed 
by relational database management systems do a good job, 
as long as the queries remain reasonably small. 

Example 3.4. Consider a U-relational database $\mathcal{U}$ that
represents a set of possible worlds over two TPC-H relations
Ord and Cust (short for Order and Customer, respectively)
[16]. $\mathcal{U}$ has one U-relation for each attribute of the
two relations, of which we only list DATE and CUSTKEY
for Ord, and NAME and CUSTKEY for Cust. The following
query finds all dates of orders placed by Al after 2003:

$\pi_{DATE}(\sigma_{NAME='Al'}(Cust) \bowtie_{CUSTKEY} \sigma_{DATE>2003}(Ord))$

Figure 3 shows three possible plans P1, P2, and P3 using
operators on vertical decompositions. The naive plan
P1 first reconstructs Ord from its two partitions, then applies
the selection and the join with Cust. In P2 and P3 the
merge operator is pushed up in the plans, first immediately
above the selection (P2), and then above the join operator
(P3). Among the three plans, P1 is clearly the least efficient.
However, without statistics about the data, one cannot tell
which of P2 and P3 should be preferred. If DATE>2003 is
very selective, then merging immediately thereafter as in P2
will lead to filtering of tuples from $\pi_{CUSTKEY}(Ord)$ and thus
fewer tuples will be processed by the join. If this is not the
case, then merging first only increases the number and size
of the tuples that have to be processed by the join. Also, in
P3 all value attributes except DATE are projected away
after the join as they are not needed for the final result. □

Figure 3. Three equivalent query plans.

Queries on U-relations. Figure 4 gives the function $[\![\cdot]\!]$
that translates positive relational algebra queries with poss
and merge operators into relational algebra queries on U-relational
databases.

The poss operator applied on a U-relation U closes the 
possible worlds semantics by computing the set of tuples 
possible in U. It thus translates to a simple projection on 
the value attributes of U. The result of a projection is a 
U-relation whose value attributes are those from the projec- 
tion list (thus the input ws-descriptors and tuple ids are pre- 
served). Selections apply conditions on the value attributes. 

The merge operator that reconstructs a relation from its
vertical partitions was already explained. Similarly to the
merge, the join uses the $\psi$-condition to discard tuple combinations
with inconsistent ws-descriptors. Figure 4 gives
the translation in case $U_1$ and $U_2$ do not contain partitions
of the same relation. For the case of self-joins we require
aliases for the copies of the relation involved in it such that
they do not have common tuple id attributes.

The union of $U_1$ and $U_2$ like the ones from Figure 4 is
sketched next. We assume that $A_1 = A_2$, $T_1 \cap T_2 = \emptyset$,
and the tuples of different relations have different ids. To
bring $U_1$ and $U_2$ to the same schema, we first ensure ws-descriptors
of the same size by padding the smaller ws-descriptors
with variable assignments they already contain, and add
new (empty) columns $T_2$ to $U_1$ and $T_1$ to $U_2$. We then perform
the standard union.

Let $U_1 := [\![Q_1]\!]$ with schema $[D_1, T_1, A_1]$,
$U_2 := [\![Q_2]\!]$ with schema $[D_2, T_2, A_2]$,
$\alpha := \bigwedge (U_1.T = U_2.T)$,

Figure 4. Translation of queries with merge
into queries on U-relations.

From our translation $[\![\cdot]\!]$ it immediately follows that

Theorem 3.5. Positive relational algebra queries extended
with the possible operator can be evaluated on U-relational
databases using relational algebra only.

Example 3.6. Recall the U-relational database of Figure 1
storing information about moving vehicles. Consider a
query asking for ids of enemy tanks:

$S = \pi_{Id}(\sigma_{Type='Tank' \wedge Faction='Enemy'}(R))$

After merging the necessary partitions of relation R and
translating it into positive relational algebra, we obtain

$\pi_{Id}(\sigma_{Type='Tank' \wedge Faction='Enemy'}(U_1 \bowtie_{\alpha_1 \wedge \psi_1} U_2 \bowtie_{\alpha_2 \wedge \psi_2} U_3))$,

where the conditions $\psi_1$, $\psi_2$, $\alpha_1$, and $\alpha_2$ follow the translation
given in Figure 4. The three vertical partitions are
joined on the tuple id attributes ($\alpha_1$ and $\alpha_2$) and the combinations
with conflicting mappings in the ws-descriptors
are discarded ($\psi_1$ and $\psi_2$). Before and after translation, the
query is subject to optimizations as discussed earlier. (In
this case, a good query plan would first apply the selections
on the partitions, then project away the irrelevant value attributes
Type and Faction, and then merge the partitions.)

The above U-relation $U_4$ encodes the query answer. □

Example 3.7. We continue Example 3.6 and ask whether it
is possible that the enemy has two tanks on the map, and if
so, which vehicles are those. For this, we compute the pairs
of enemy tanks as a self-join of S: $(S\ s_1) \bowtie_{s_1.Id \neq s_2.Id} (S\ s_2)$.
This query is in turn equivalent to a self-join of $U_4$.

The answer is encoded by the above U-relation $U_5$. Note
that the combinations of the first two tuples of $U_4$ are not in
$U_5$, because they have inconsistent ws-descriptors and are
filtered out using the $\psi$-condition (vehicle c cannot be at the
same time at two different positions). To obtain the possible
pairs of vehicle ids, we apply the poss operator on $U_5$. This
is expressed as the projection on the value attributes of $U_5$. □
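
The translated join and the poss operator can be sketched in Python as follows. The content of `U4` below is an assumption derived from the vehicle scenario (the enemy tank is vehicle 3 if $x \mapsto 1$, vehicle 2 if $x \mapsto 2$, and vehicle 4 if $y \mapsto 1$ and $z \mapsto 2$), and the join condition (distinct ids) as well as the names `psi`, `join`, `poss` are likewise illustrative choices.

```python
def psi(d1, d2):
    """Consistency: no variable is mapped to two different values."""
    return all(d2[var] == val for var, val in d1.items() if var in d2)


def join(u1, u2, condition):
    """Translated join: apply the join condition on the value attributes and
    discard combinations with inconsistent ws-descriptors (psi)."""
    out = []
    for d1, t1, v1 in u1:
        for d2, t2, v2 in u2:
            if condition(v1, v2) and psi(d1, d2):
                out.append(({**d1, **d2}, (t1, t2), (v1, v2)))
    return out


def poss(u):
    """poss is a projection on the value attributes."""
    return {vals for _, _, vals in u}


# Assumed tuple-level U-relation with the ids of enemy tanks:
# rows are (ws-descriptor, tuple id, Id).
U4 = [
    ({"x": 1}, "t3", 3),
    ({"x": 2}, "t2", 2),
    ({"y": 1, "z": 2}, "t4", 4),
]

U5 = join(U4, U4, lambda a, b: a != b)
possible_pairs = poss(U5)
```

The pair of vehicles 2 and 3 never appears in `possible_pairs`: the two rows carry $x \mapsto 2$ and $x \mapsto 1$, which `psi` rejects.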

Our translation yields relational algebra queries, whose 
evaluation always produces tuple-level U-relations, i.e., U- 
relations without vertical decompositions, by joining and 
merging vertical partitions of relations. Following the defi- 
nition of the merge operator, if the input U-relations are re- 
duced, then the result of merging vertical partitions is also 
reduced. We thus have that 

Proposition 3.8. Given a positive relational algebra query
Q and a reduced U-relational database $\mathcal{U}$, $[\![Q]\!](\mathcal{U})$ is a reduced
U-relational database.

4 Normalization of U-relations 

U-relations do not forbid large ws-descriptors. The abil- 
ity to extend the size of ws-descriptors is what yields effi- 
cient query evaluation on U-relations. However, large ws- 
descriptors cause an inherent processing overhead. Also, 
after query evaluation or dependency chasing on a U- 
relational database, it may happen that tuple fields, which 
used to be dependent on each other, become independent. 
In such a case, it is desirable to optimize the world-set representation
[6]. We next discuss one approach to normalize
U-relational databases by reducing large ws-descriptors to
ws-descriptors of size one. Normalization is an expensive
operation per se, but it is not unrealistic to assume that uncertain
data is initially in normal form [4, 6] and can subsequently
be maintained in this form.

Definition 4.1. A U-relational database is normalized if all 
ws-descriptors of its U-relations have size one. 

Algorithm 1 gives a normalization procedure for U-relations
that determines classes of variables that co-occur
in some ws-descriptors and replaces each such class by one
variable, whose domain becomes the product of the domains
of the variables from that class. Figure 5 shows a
U-relational database and its normalization.

Algorithm 1: Normalization of ws-descriptors.

Input: Reduced U-relational database $\mathcal{U} = (U_1, \ldots, U_m, W)$
Output: Normalized reduced U-relational database.
begin
    R := the relation consisting of all pairs of variables
         $(c_i, c_j)$ that occur together in some ws-descriptor of $\mathcal{U}$;
    G := the graph whose node set is the set of variables and
         whose edge relation is the refl. and trans. closure of R;
    Compute the connected components of G;
    foreach U-relation $U_j(D_1, \ldots, D_n, T, A)$ of $\mathcal{U}$ do
        $U'_j$ := empty U-relation over $U'_j(Var, Rng, T, A)$;
        foreach $t \in U_j$ do
            $G_i$ := connected component of G with id i such
                     that the nodes $t.Var_1, \ldots, t.Var_n$ are in $G_i$;
            $\{c_{i_1}, \ldots, c_{i_k}\} := G_i - \{t.Var_1, \ldots, t.Var_n\}$;
            foreach $l_{i_1} : (c_{i_1}, l_{i_1}) \in W, \ldots, l_{i_k} : (c_{i_k}, l_{i_k}) \in W$ do
                /* Compute a new domain value ($f_{|G_i|}$ is
                   either the identity or better, for atomic l's,
                   an injective function $int^{|G_i|} \to int$) */
                $l := f_{|G_i|}(t.Rng_1, \ldots, t.Rng_n, l_{i_1}, \ldots, l_{i_k})$;
                $U'_j := U'_j \cup \{(G_i, l, t.T, t.A)\}$;
    $W' := \bigcup_i \{(G_i, f_{|G_i|}(l_1, \ldots, l_m)) \mid G_i = \{c_1, \ldots, c_m\}$ and
           $(c_1, l_1), \ldots, (c_m, l_m) \in W\}$;
    Output $(U'_1, \ldots, U'_m, W')$;
end

Theorem 4.2. Given a reduced U-relational database, Algorithm 1
computes a normalized reduced U-relational
database that represents the same world-set.
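
The normalization procedure can be sketched in Python as below. This is an illustrative reading of Algorithm 1 under several assumptions: ws-descriptors are non-empty dictionaries, the world-table is a dictionary from variables to value lists, connected components are found with a simple union-find, each new variable is named by the tuple of its member variables, and the new domain value is the tuple of member values (the identity choice for the function f). All names are hypothetical.

```python
from itertools import product


def normalize(u_relations, W):
    """u_relations: list of U-relations, each a list of
    (ws-descriptor dict, tuple id, value); W: dict variable -> list of values.
    Assumes every ws-descriptor is non-empty."""
    # Union-find over variables that co-occur in some ws-descriptor.
    parent = {v: v for v in W}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in u_relations:
        for d, _, _ in u:
            first, *rest = list(d)
            for v in rest:
                parent[find(v)] = find(first)

    # One new variable per connected component, named by its members.
    members = {}
    for v in sorted(W):
        members.setdefault(find(v), []).append(v)
    comp_of = {v: tuple(vs) for vs in members.values() for v in vs}

    # New world-table: the domain of a component is the product of the domains.
    new_W = {comp: list(product(*(W[v] for v in comp)))
             for comp in set(comp_of.values())}

    # Replace each descriptor by all component values that extend it.
    new_relations = []
    for u in u_relations:
        new_u = []
        for d, tid, val in u:
            comp = comp_of[next(iter(d))]
            for combo in new_W[comp]:
                if all(combo[comp.index(v)] == x for v, x in d.items()):
                    new_u.append(({comp: combo}, tid, val))
        new_relations.append(new_u)
    return new_relations, new_W


W = {"c1": [1, 2], "c2": [1, 2], "c3": [1, 2]}
U1 = [({"c1": 1, "c2": 2}, "t1", "a"), ({"c3": 1}, "t2", "b")]
U2 = [({"c1": 1}, "t1", "x")]
normalized, new_W = normalize([U1, U2], W)
```

In the sample, `c1` and `c2` co-occur and are fused into one variable with four domain values, so the single row of `U2` (which mentions only `c1`) is expanded into two rows, one per value of `c2`.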

Computing certain answers. Given a set of possible 
worlds, we call a tuple certain iff it occurs in each of the 
worlds. It is known that the tuple certainty problem is 
coNP-hard for a number of representation systems, ranging 
from attribute-level ones like WSDs to tuple-level ones like 
ULDBs [6]. In case of tuple-level normalized U-relations,
however, we can efficiently compute the certain tuples using 
relational algebra. 

Lemma 4.3. A tuple $t$ is certain in a tuple-level normalized
U-relation U iff there exists a variable $x$ such that $(x \mapsto l, s, t) \in U$
for each domain value $l$ of $x$ and some tuple id $s$.

The condition of the lemma can be encoded as the following
domain calculus expression:

$cert(U) := \{ t \mid \exists x \, \forall l \, ((x, l) \in W \Rightarrow \exists s \, (x, l, s, t) \in U) \}$

The equivalent relational algebra query on a tuple-level normalized
U-relational database $(U[Var, Rng, T_R, A], W)$ is

$\pi_{A}(\pi_{Var}(W) \times \pi_{A}(U) - \pi_{Var,A}(W \times \pi_{A}(U) - \pi_{Var,Rng,A}(U)))$.
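
The relational algebra query for certain answers can be mirrored step by step with Python sets. The sketch assumes a tuple-level normalized U-relation given as a set of (Var, Rng, tuple id, value) quadruples and a world-table given as a set of (Var, Rng) pairs; the function name `certain` and the sample data are hypothetical.

```python
def certain(U, W):
    """Certain values of a tuple-level normalized U-relation.

    U: set of (var, rng, tuple_id, value); W: set of (var, rng).
    """
    values = {a for _, _, _, a in U}                    # pi_A(U)
    present = {(x, l, a) for x, l, _, a in U}           # pi_{Var,Rng,A}(U)
    all_pairs = {(x, a) for x, _ in W for a in values}  # pi_Var(W) x pi_A(U)
    # Pairs (x, a) for which some domain value l of x does not yield a:
    # pi_{Var,A}(W x pi_A(U) - pi_{Var,Rng,A}(U))
    missing = {(x, a) for x, l in W for a in values
               if (x, l, a) not in present}
    # A value is certain if some variable yields it for all its domain values.
    return {a for _, a in all_pairs - missing}


W = {("x", 1), ("x", 2), ("y", 1), ("y", 2)}
U = {
    ("x", 1, "s1", "p"),
    ("x", 2, "s2", "p"),   # "p" appears for every value of x
    ("y", 1, "s3", "q"),   # "q" appears only when y = 1
}
```

Here `certain(U, W)` keeps "p", which is produced under both values of x, and drops "q", which is absent from the worlds with $y \mapsto 2$.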
(c) WSD corresponding to (b)

Figure 5. Normalization example. 

5 Succinctness and Efficiency 

This section compares U-relational databases with
WSDs [4, 6] and ULDBs [8] using two yardsticks: succinctness,
i.e., how compactly they can represent world-sets, and
efficiency of query evaluation.

WSDs vs. U-Relations. WSDs are essentially normalized
U-relational databases where each variable $c_i$ of a U-relation
corresponds to a WSD component relation $C_i$ and
each domain value $l_i$ of $c_i$ corresponds to a tuple of $C_i$.
Figure 5(c) shows a WSD equivalent to a normalized U-relational
database. The normalization may lead to an exponential
blow-up in the database size and accounts for U-relations
with arbitrarily large ws-descriptors being more
compact than U-relations with singleton ws-descriptors and
thus than WSDs.

Example 5.1. Consider a relation over schema R[AB]
where each field value can be 0 or 1, and the tuple
fields $t_i.A$ and $t_{(i+1) \bmod n}.B$ depend on each other ($1 \leq i \leq n$).
The encodings as WSD and as a set of two U-relations are
given in Figure 6. □

Theorem 5.2. U-relational databases are exponentially 
more succinct than WSDs. 

Positive relational queries have polynomial data complexity
for U-relations (Section 3) and exponential data
complexity for WSDs [6]. This can be explained in close
analogy to the difference in succinctness and by the fact
that query evaluation creates new dependencies [10]: U-relations
can efficiently store the new dependencies by enlarging
ws-descriptors, whereas WSDs correspond to U-relations
with normalized ws-descriptors, hence the exponential
blowup.

(b) U-relational encoding.

Figure 6. WSD and U-relational encoding of
the world-set of Example 5.1.

(b) U-relational encoding.

Figure 7. WSD and U-relation representing
the answer to $\sigma_{A=B}(R)$ with R of Figure 6.

Example 5.3. Consider the WSD and U-relations of Example
5.1 and the selection with join condition $\sigma_{A=B}(R)$. The
answer is represented by the WSD and U-relation respectively
shown in Figure 7. The U-relation $U_3$ has $2 \cdot n$ tuples,
whereas the WSD $c_1 \times \cdots \times c_n$ has $2^n$ tuples, each representing
a possible combination of the values of the existing
fields (a tuple $t_i$ does not occur in worlds where $t_i.A$ or $t_i.B$
have values $\bot$). Note that by normalizing $U_3$ we would also
obtain one variable with $2^n$ domain values, as for the WSD.

The answer to $poss(\sigma_{A=B}(R))$ is efficiently computed as
$\pi_{A,B}(U_3)$ in the case of U-relations. In the WSD case, it is
computed as $\bigcup_{i=1}^{n} \pi_{t_i.A, t_i.B}(c_1 \times \cdots \times c_n)$. □



Finally, the query translations employed by the evalua- 
tion algorithms in the WSD and U-relational cases are dif- 
ferent. Whereas for WSDs all operators are translated to 
sequences of relational queries and in the case of projec- 
tion and join even to fixpoint programs [4], the translation 
remains strictly in relational algebra for U-relations. 
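
The size gap discussed in Example 5.3 can be illustrated with a small Python sketch. It is a simplification and not the encoding of Figure 7: it assumes one independent, hypothetical variable x_i per tuple t_i with two local worlds, and the function names are ours. The U-relation is scanned once to obtain the possible answers, whereas the enumeration route has to visit every combination of local worlds.

```python
from itertools import product


def build_u3(n):
    """Rows (ws-descriptor, tuple id, A, B) of a toy answer U-relation."""
    rows = []
    for i in range(1, n + 1):
        for v in (0, 1):
            # Hypothetical variable x_i: the assignment x_i -> v selects
            # the worlds in which tuple t_i equals (v, v).
            rows.append(({f"x{i}": v}, f"t{i}", v, v))
    return rows


def poss_u(rows):
    """poss on the U-relation: project away descriptors and tuple ids."""
    return {(a, b) for _, _, a, b in rows}


def poss_by_enumeration(n):
    """Same answer obtained by visiting every combination explicitly."""
    answers, combinations = set(), 0
    for choice in product((0, 1), repeat=n):
        combinations += 1
        answers.update((v, v) for v in choice)
    return answers, combinations


n = 10
u3 = build_u3(n)
answers, combinations = poss_by_enumeration(n)
print(len(u3), combinations)  # 2*n rows versus 2**n combinations
```

Both routes return the same set of possible tuples; only the amount of data touched differs.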

ULDBs vs. U-Relations. A ULDB relation is a set of
x-tuples, where each x-tuple represents a set of alternatives.
One world is defined by choosing precisely one alternative
of each x-tuple. A world may contain none of the alternatives
of an x-tuple if this x-tuple is marked as optional (or
maybe) using the ?-symbol. Dependencies between alternatives
of different x-tuples are enforced using lineage: an
alternative i of an x-tuple s occurs in the same worlds as
an alternative j of another x-tuple t if the lineage of (s, i)
points either to (t, j) or to another alternative that
transitively points to (t, j). The lineage of an alternative can
also point to an external symbol (t, j) if there is no
alternative (t, j) in the database [8].

Example 5.4. The U-relations representing relation R in
Figure 1 admit the following equivalent ULDB:

    R | (Id, Type, Faction)
    --+---------------------------------------------------------
    a | 1: (1, Tank, Friend)
    b | 1: (2, Transport, Friend) || 2: (3, Transport, Friend)
    c | 1: (3, Tank, Enemy) || 2: (2, Tank, Enemy)
    d | 1: (4, Tank, Friend) || 2: (4, Tank, Enemy) ||
      | 3: (4, Transport, Friend) || 4: (4, Transport, Enemy)

where the lineage λ is given by λ(b, 1) = {(c, 1)} and
λ(b, 2) = {(c, 2)}.

To construct a ULDB equivalent to the U-relational
database of Figure 1, we have to enumerate all possible
value combinations for the attributes of R. This enumeration
is not necessary for U-relations because of vertical
partitioning and the independence of (most) tuple fields. □

Lemma 5.5. ULDBs [8] can be translated linearly into
U-relational databases.

Proof. We sketch the proof for a single ULDB relation R; it
can be extended trivially to the case of several relations.

For every x-tuple t in R we create a new variable $c_t$, and
for each alternative j of t we create a new domain value
$w_{(t,j)}$ of $c_t$. For every alternative in R with value a,
id (t, j), and lineage

    $\lambda(t, j) = \bigwedge_{i=1}^{n_{(t,j)}} (t_i, j_i)$

we create a tuple in U with value a, tuple id t, and
ws-descriptor (with $n = n_{(t,j)}$)

    $D_{(t,j)} = [(c_t, w_{(t,j)}), (c_{t_1}, w_{(t_1,j_1)}), \dots, (c_{t_n}, w_{(t_n,j_n)})]$.

In case $n_{(t,j)}$ is smaller than $n_{(s,l)}$ of an alternative l
of an x-tuple s, we pad the above ws-descriptor with
$n_{(s,l)} - n_{(t,j)}$ pairs $(c_t, w_{(t,j)})$.

The world table W is the set of pairs of variables and
domain values created for the x-tuples of R. For each optional
x-tuple t in R, we also add to W a tuple $(c_t, w)$ where w is a
fresh domain value for $c_t$. □

    Q1: possible (select o.orderkey, o.orderdate, o.shippriority from
        customer c, orders o, lineitem l where c.mktsegment = 'BUILDING'
        and c.custkey = o.custkey and o.orderkey = l.orderkey
        and o.orderdate > '1995-03-15' and l.shipdate < '1995-03-17')

    Q2: possible (select extendedprice from lineitem where
        shipdate between '1994-01-01' and '1996-01-01'
        and discount between '0.05' and '0.08' and quantity < 24)

    Q3: possible (select n1.name, n2.name from supplier s, lineitem l,
        orders o, customer c, nation n1, nation n2 where n2.nation='IRAQ'
        and n1.nation='GERMANY' and c.nationkey = n2.nationkey
        and s.suppkey = l.suppkey and o.orderkey = l.orderkey
        and c.custkey = o.custkey and s.nationkey = n1.nationkey)

Figure 8. Queries used in the experiments.
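
The construction in the proof of Lemma 5.5 can be sketched in Python as follows. This is a minimal illustration under our own assumptions: x-tuples are given as a dictionary, alternatives are numbered from 1, lineage is internal only (external symbols are not handled), and the function name and the "absent" marker for optional x-tuples are hypothetical.

```python
def uldb_to_urelation(xtuples, lineage, optional=()):
    """Translate one ULDB relation into U-relation rows and a world table.

    xtuples:  {x-tuple id: [alternative values]}, alternatives numbered from 1
    lineage:  {(t, j): [(t_i, j_i), ...]}, internal lineage only (assumption)
    optional: ids of x-tuples marked with '?'
    """
    rows, world_table = [], set()
    for t, alternatives in xtuples.items():
        for j, value in enumerate(alternatives, start=1):
            # One variable c_t per x-tuple, one domain value per alternative.
            descriptor = [(f"c_{t}", (t, j))]
            descriptor += [(f"c_{s}", (s, l)) for s, l in lineage.get((t, j), [])]
            rows.append((descriptor, t, value))
            world_table.add((f"c_{t}", (t, j)))
    for t in optional:
        # Fresh domain value for the worlds in which the x-tuple is absent.
        world_table.add((f"c_{t}", (t, "absent")))
    # Pad shorter descriptors by repeating their first pair.
    width = max(len(d) for d, _, _ in rows)
    padded = [(d + [d[0]] * (width - len(d)), t, v) for d, t, v in rows]
    return padded, world_table


# The ULDB of Example 5.4.
xtuples = {
    "a": [(1, "Tank", "Friend")],
    "b": [(2, "Transport", "Friend"), (3, "Transport", "Friend")],
    "c": [(3, "Tank", "Enemy"), (2, "Tank", "Enemy")],
    "d": [(4, "Tank", "Friend"), (4, "Tank", "Enemy"),
          (4, "Transport", "Friend"), (4, "Transport", "Enemy")],
}
lineage = {("b", 1): [("c", 1)], ("b", 2): [("c", 2)]}
rows, world_table = uldb_to_urelation(xtuples, lineage)
```

The output has one row per alternative, so its size is linear in the size of the ULDB, up to the padding of descriptors.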



There are U-relations, however, whose ULDB encodings are
necessarily exponential in the arity of the logical relation.
This is the case of, e.g., or-set relations [13], which are
attribute-level representations that can be encoded linearly
as U-relations but only exponentially as ULDBs.

Theorem 5.6. U-relational databases are exponentially 
more succinct than ULDBs. 

Both ULDBs and U-relations have polynomial data complexity
for positive relational queries. Unlike for ULDBs, evaluating
queries on U-relations is possible using relational algebra
only. The main difference between their evaluation algorithms
concerns erroneous tuples, i.e., tuples that do not appear in
any world. In contrast to U-relations, erroneous tuples may
appear in the answers to queries on ULDBs (see [8] for an
example). The removal of such tuples is called data
minimization, an expensive operation that involves the
computation of the transitive closure of lineage [8]. Such
tuples occur with ULDBs because the lineage of an alternative
in the answer only points to the lineage of alternatives from
the input relations, even though these input alternatives may
not occur in the same world. This cannot happen with
U-relations because each query operation ensures that only
valid tuples are in the query answer by (1) using the
ψ-condition in the join and merge operations and by
(2) carrying all dependencies in the ws-descriptors, rather
than only references to tuples of the input relation.

To sum up, U-relations have the advantages of WSDs
(attribute-level representation) and ULDBs (polynomial
evaluation of positive relational algebra queries), while
forming an exponentially more succinct representation system
than both aforementioned approaches.

Figure 9. Total number of worlds, max. number of local worlds in a component, and size in MB of the
U-relational database for each of our settings.



6 Experiments 

Prototype Implementation. We implemented the query
translator of Figure 4 and also extended the C implementation
of the TPC-H population generator version 2.6 build 1 [16]
to generate attribute- and tuple-level U-relations and
ULDBs. The code is available on the MayBMS project page
(http://www.infosys.uni-sb.de/projects/maybms).
Setup. The experiments were performed on a 3 GHz/1 GB
Pentium running Linux 2.6.13 and PostgreSQL 8.2.3.
Generation of uncertain data. Our data generator creates
eight tables: part, partsupp, supplier, customer, lineitem,
orders, nation, region. The field values are sensitive to the
attribute types and are randomly generated or randomly chosen
from the dictionary explained in the TPC-H benchmark
specification. The following parameters were used to tune
the generation: scale (s), uncertainty ratio (x), correlation
ratio (z), and maximum alternatives per field (m). The (dbgen
standard) parameter s is used to control the size of each
world; x controls the percentage of (uncertain) fields with
several possible values, and m controls how many possible
values can be assigned to a field. The parameter z defines a
Zipf distribution for the variables with different dependent
field counts³ (DFC) and controls the attribute correlations:
For n uncertain fields, there are $\lceil C \cdot z^i \rceil$
variables with DFC i, where $C = n(z - 1)/(z^{k+1} - 1)$, i.e.,
$n = \sum_{i=0}^{k} (C \cdot z^i)$. The number of domain values
of a variable with DFC k > 1 is chosen using the formula
$p^{k-1} \cdot \prod_{i=1}^{k} m_i$, where $m_i$ is the number of
different values for the field i dependent on that variable
and p is the probability that a combination of possible values
for the k fields occurs. This assumption fits naturally to
data cleaning scenarios. Previous work [4] shows that chasing
dependencies on WSDs enforces correlations between field
values and removes combinations that violate the dependencies.
We considered here that after correlating two variables with
arbitrary DFCs, 100(1 − p) percent of the combinations violate
constraints and thus are dropped.

³ This is the number of tuple fields dependent on that variable.
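
The two generator formulas above can be checked numerically with a short Python sketch. It reads the formulas literally, with the DFC index i ranging from 0 to k as in the sum; the function names and the sample parameter values (n = 1000, k = 3) are our own assumptions and are not taken from the experiments.

```python
import math


def variables_per_dfc(n, z, k):
    """Return C and the number of variables for each DFC index i = 0..k."""
    C = n * (z - 1) / (z ** (k + 1) - 1)
    counts = [math.ceil(C * z ** i) for i in range(k + 1)]
    return C, counts


def domain_size(p, values_per_field):
    """Domain size p^(k-1) * prod(m_i) of a variable with k dependent fields."""
    k = len(values_per_field)
    return p ** (k - 1) * math.prod(values_per_field)


C, counts = variables_per_dfc(n=1000, z=0.5, k=3)
total = sum(C * 0.5 ** i for i in range(4))  # geometric sum, recovers n
print(counts, total)
print(domain_size(0.25, [8]), domain_size(0.25, [8, 8]))
```

With p = 0.25 and m = 8, a variable with one dependent field keeps all 8 values, while a variable with two dependent fields keeps a quarter of the 64 combinations.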

The uncertain fields are assigned randomly to variables.
This can lead to correlations between fields belonging to
different tuples or even to different relations, which fits
scenarios where constraints are enforced across tuples or
relations. We do not assume any kind of independence of
our initial data, as is done in several other approaches [10, 8].

Our data generator works as follows. While generating
tuples for the eight tables, we use the uncertainty ratio to
decide at each tuple field if it is uncertain or not. We
collect in a field pool the coordinates (i.e., relation, tuple
id, attribute) of the uncertain tuple fields, and when the
original TPC-H generator finishes its job or the field pool is
full, we shuffle the uncertain tuple fields, compute the
correlation ratio for variables with different DFC, and
incrementally assign tuple fields to variables. Then, we
compute the domain size of each variable and the number of
different values for each of the variable's fields. The field
values are then generated using the data distribution and
dictionary for that field type, as specified by the original
TPC-H generator. Because there can be too many field
coordinates to keep in memory at a time, we use in our
experiments a window of 10 million fields to be processed in
bulk⁴; after a window is processed, the memory is released,
and a new window is filled in and processed. The window size
influences the number and dependent field count of the
variables. For the experiments, we fixed p to 0.25, m to 8, and
varied the remaining parameters as follows: s ranges over
(0.01, 0.05, 0.1, 0.5, 1), z ranges over (0.1, 0.25, 0.5), and x
ranges over (0.001, 0.01, 0.1).

⁴ It corresponds to a maximum of 500 MB of main memory allocated
for dbgen on our testing machine.

Figure 10. Query plan for Q1 using merge.

An important property of our generator is that any world 
in a U-relational database shares the properties of the one- 
world database generated by the original dbgen: The sizes 
of relations are the same and the join selectivities are ap- 
proximately equal. We checked this by randomly choosing 
one world of the U-relational database and comparing the 
selectivities of joins on the keys of the TPC-H relations for 
different scale factors and uncertainty ratios. 
Queries. We used the three queries from Figure 8. Query
Q1 is a join of three relations of large sizes. Query Q2 is
a select-project query on the relation lineitem (the largest
in our settings). Query Q3 is a fairly complex query that
involves joins between six relations. All queries use the
operator 'possible' to retrieve the set of matches across all
worlds. Note that these queries are modified versions of Q3,
Q6, and Q7 of TPC-H where all aggregations are dropped
(dealing with aggregation is subject to future work).

Figure 11 shows that our queries are moderately selective
and that their answer sizes increase with uncertainty x and
marginally with correlation z. For scale 1, the answer sizes
range from tens of thousands to tens of millions of tuples.
There is only one setting (z = 0.25 and x = 0.1) where one
of our queries, Q3, has an empty answer. Before the execution,
the queries were optimized using our U-relation-aware
optimizations. Figure 10 shows Q1 after optimizations.
Characteristics of U-relations. According to Figure 9, the
U-relational databases are exponentially more succinct than
databases representing all worlds individually: while the
number of worlds increases exponentially (when varying
the uncertainty ratio x), the database size increases only
linearly. The case of x = 0 corresponds to one world generated
using the original dbgen. Interestingly, to represent
$10^{8 \cdot 10^6}$ worlds, the U-relational database needs
about 6.7 times the size of one world.
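
The contrast between an exponential number of worlds and a linear representation size can be made concrete with a small Python sketch. It assumes that a world corresponds to one choice of domain value per variable, so the number of worlds is the product of the domain sizes, while the world table holds one row per variable and domain value. The function names and the sample domain sizes are hypothetical.

```python
import math


def log10_worlds(domain_sizes):
    """log10 of the number of worlds: the product of all domain sizes."""
    return sum(math.log10(d) for d in domain_sizes)


def world_table_rows(domain_sizes):
    """Rows of the world table: one per (variable, domain value) pair."""
    return sum(domain_sizes)


small = [2, 2, 4]
print(round(10 ** log10_worlds(small)), world_table_rows(small))

# A thousand variables with 8 values each: about 10^903 worlds
# are described by only 8000 world-table rows.
large = [8] * 1000
print(log10_worlds(large), world_table_rows(large))
```

Working with the logarithm avoids materializing numbers of the magnitude reported in Figure 9.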

An increase of the scaling factor leads to an exponential
increase in the number of worlds and only to a linear
increase in the size of the U-relational database. The
maximum domain size of a variable is indirectly influenced by
s: when s increases, there are more uncertain fields and
thus it is more likely to obtain variables with more dependent
fields. By our construction, the domain size of variables
with higher DFC can be much larger than the maximum domain
size of variables with DFC = 1 (which is m = 8). This
is because a variable with DFC k has a fraction (p = 0.25)
of the product of the domain values of k variables taken
together. As shown in Figure 9, our settings have variables
with domain sizes of up to 3392. Although we only report
here on experiments with scale factors up to 1, further
experiments confirmed that similar characteristics are
obtained for larger scales, too. An increase of the
correlation parameter leads to a moderate relative increase in
the database size. When compared to one-world databases, the
sizes of U-relational databases have increase factors that
vary from 6.2 (for z = 0.1) to 8.2 (for z = 0.5).

Merge Join  (cost=3187724.24..434887461.47 rows=14175759582 width=18)
  Merge Cond: (u_l_quantity.tid = u_l_extendedprice.tid)
  Join Filter: (((u_l_quantity.c1 <> u_l_extendedprice.c1) OR
      (u_l_quantity.w1 = u_l_extendedprice.w1)) AND
      ((u_l_extendedprice.c1 <> u_l_discount.c1) OR
      (u_l_extendedprice.w1 = u_l_discount.w1)) AND
      ((u_l_extendedprice.c1 <> u_l_shipdate.c1) OR
      (u_l_extendedprice.w1 = u_l_shipdate.w1)))
  ->  Merge Join  (cost=1381116.36..7243281.93 rows=224865665 width=79)
        Merge Cond: (u_l_shipdate.tid = u_l_quantity.tid)
        Join Filter: (((u_l_quantity.c1 <> u_l_shipdate.c1) OR
            (u_l_quantity.w1 = u_l_shipdate.w1)) AND
            ((u_l_quantity.c1 <> u_l_discount.c1) OR
            (u_l_quantity.w1 = u_l_discount.w1)))
        ->  Merge Join  (cost=818344.64..1826829.84 rows=18658797 width=55)
              Merge Cond: (u_l_discount.tid = u_l_shipdate.tid)
              Join Filter: ((u_l_shipdate.c1 <> u_l_discount.c1) OR
                  (u_l_shipdate.w1 = u_l_discount.w1))
              ->  Sort  (cost=269775.78..271512.42 rows=694689 width=31)
                    Sort Key: u_l_discount.tid
                    ->  Seq Scan on u_l_discount  (cost=0.00..164374.88 rows=694689 width=31)
                          Filter: ((l_discount > '0.05') AND (l_discount < '0.08'))
              ->  Sort  (cost=548568.94..545791.18 rows=2888896 width=24)
                    Sort Key: u_l_shipdate.tid
                    ->  Seq Scan on u_l_shipdate  (cost=0.00..171354.29 rows=2888896 width=24)
                          Filter: ((l_shipdate > '1994-01-01') AND (l_shipdate < '1996-01-01'))
        ->  Sort  (cost=578771.73..576676.98 rows=2362181 width=24)
              Sort Key: u_l_quantity.tid
              ->  Seq Scan on u_l_quantity  (cost=0.00..151169.98 rows=2362181 width=24)
                    Filter: (l_quantity < '24')
  ->  Sort  (cost=1886687.87..1824248.68 rows=7853122 width=35)
        Sort Key: u_l_extendedprice.tid
        ->  Seq Scan on u_l_extendedprice  (cost=0.00..136447.22 rows=7853122 width=35)

Figure 13. Query plan for Q2 (s = 1, x = 0.1, z = 0.1), as
generated by PostgreSQL.
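
The join filters in the plan of Figure 13 all have the same shape: two rows with the same tuple id are combined only if their ws-descriptors do not assign different values to the same variable. The following Python sketch illustrates this check on two toy vertical partitions. It is a naive nested-loop illustration, not the merge join that PostgreSQL chooses, and the row layout, variable names, and sample data are our own assumptions.

```python
def consistent(d1, d2):
    """True if no variable is assigned different values by d1 and d2."""
    return all(d2.get(var, val) == val for var, val in d1.items())


def merge(left, right):
    """Combine two vertical partitions on tuple id.

    Rows are (ws-descriptor dict, tuple id, tuple of values). Only pairs
    with consistent descriptors survive, so no erroneous tuple is created.
    """
    out = []
    for d1, tid1, v1 in left:
        for d2, tid2, v2 in right:
            if tid1 == tid2 and consistent(d1, d2):
                out.append(({**d1, **d2}, tid1, v1 + v2))
    return out


# Hypothetical partitions for the Type and Faction fields of one tuple.
u_type = [({"x": 1}, "t1", ("Tank",)), ({"x": 2}, "t1", ("Transport",))]
u_faction = [({"x": 1}, "t1", ("Friend",)), ({"y": 1}, "t1", ("Enemy",))]

merged = merge(u_type, u_faction)
for descriptor, tid, values in merged:
    print(descriptor, tid, values)
```

The pair (Transport, Friend) is dropped because it would require x = 2 and x = 1 at the same time; the other three combinations are kept together with the union of their descriptors.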

Query Evaluation on U-relations. We ran our set of three
queries four times on the 45 different datasets reported in
Figure 9. For each query and correlation ratio, Figure 12
has a log-log scale diagram showing the median evaluation
(including storage) time in seconds as a function of the scale
and uncertainty parameters. The different lines in each of
the diagrams correspond to different uncertainty ratios.

Figure 12 shows that the evaluation of our queries is
efficient and scalable. In our largest scenario, where the
database has size 13 GB and represents $10^{8 \cdot 10^6}$
worlds with 1.4 GB per world, query Q3 involving five joins is
evaluated in less than two and a half minutes. One explanation
for the good performance is the use of attribute-level
representation. This makes it possible to first compute the
joins locally using only the join attributes and later merge
in the remaining attributes of interest. Another important
reason for the efficiency is that, due to the simplicity of our
rewritings, PostgreSQL optimizes the queries in a fairly good
way. Figure 13 shows an optimized query plan produced by the
PostgreSQL 'explain' statement for the rewriting of Q2.

Figure 12. Performance of query evaluation for various scale,
uncertainty, and correlation.

The evaluation time varies linearly with all of our
parameters. For Q1 (Q2 and Q3, respectively) we witnessed
a factor of up to 6 (4 and 10, respectively) in the evaluation
time when varying the uncertainty ratio from 0.001 to
0.1. When the correlation ratio is varied from 0.1 to 0.5,
the evaluation time increases by a factor of up to 3; this is
also explained by the increase in the input and answer sizes,
cf. Figures 9 and 11. When the scale parameter is varied
from 0.01 to 1, the evaluation time increases by a factor of
up to 400; in case of Qi and z = 0.5, we also noticed some
outliers where the increase factor is around 1000. The
considerably smaller evaluation time for Q3 in case of scale 1,
uncertainty 0.1, and correlation 0.25 occurs because for that
scenario no 'GERMANY' entry is generated for the nation
table, thus the query answer is empty.

Effect of attribute-level representation. We also performed
query evaluation on tuple-level U-relations, which represent
the same world-set as the attribute-level U-relations of
Figure 9, and on Trio's ULDBs [8] obtained by a (rather
direct) mapping from the tuple-level U-relations. To date,
Trio has no native support for the poss operator or the
removal of erroneous tuples in the query answer, though this
effect can be obtained as part of the confidence computation⁵.
For that reason, we decided to compare the evaluation times of
queries without the poss operator and without the (expensive)
removal of erroneous tuples or confidence computation (which
is an exponential-time problem). Since our data exhibits a
high degree of (randomly generated) dependency, its ULDB
representation has lineage and thus join queries can introduce
erroneous tuples in the answer. The Trio prototype was set to
use the (faster) SPI interface of PostgreSQL (and not its
default Python implementation).

⁵ Personal communication with the TRIO team as of June 2007.

Figure 14. Querying attribute- and tuple-level
U-relations in MayBMS and ULDBs in Trio.

Figure 14 compares the evaluation time on attribute- and
tuple-level U-relations in MayBMS, and ULDBs for small 
scenarios of 1% uncertainty, our lowest correlation factor 
0.1, and scale up to 0.1. On attribute-level U-relations, the 
queries perform several times better than on tuple-level U- 
relations and by an order of magnitude better than ULDBs. 
This is because attribute-level data allows for late material- 
ization: selections and joins can be performed locally and 
tuple reconstruction is done only for successful tuples. We 
witnessed that an increase in any of our parameters would 
create prohibitively large (exponential in the arity) tuple- 
level representations. For example, for scale 0.01 and un- 
certainty 10%, relation lineitem contains more than 15M 
tuples compared to 80K in each of its vertical partitions. 
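
The blow-up of tuple-level representations can be estimated with a back-of-the-envelope Python sketch. It assumes that the fields of a tuple are independent, so that a tuple-level encoding needs one row per combination of field alternatives, whereas each vertical partition needs one row per alternative of its own field. The function names and the sample counts are illustrative and are not measurements from our experiments.

```python
import math


def attribute_level_rows(alternatives_per_field):
    """Total rows over all vertical partitions: one per field alternative."""
    return sum(alternatives_per_field)


def tuple_level_rows(alternatives_per_field):
    """Rows of a tuple-level encoding: one per combination of alternatives."""
    return math.prod(alternatives_per_field)


# Tuple d of Example 5.4: a certain Id, two Types, two Factions.
d = [1, 2, 2]
print(attribute_level_rows(d), tuple_level_rows(d))

# A hypothetical tuple with 16 fields, 4 of which have 8 alternatives each.
wide = [8] * 4 + [1] * 12
print(attribute_level_rows(wide), tuple_level_rows(wide))
```

For the wide tuple, 44 partition rows stand against 4096 tuple-level rows, which is the kind of gap that makes late materialization attractive.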

7 Conclusion and Future Work 

This paper introduces U-relational databases, a simple 
representation system for uncertain data that combines the 
advantages of existing systems, like ULDBs and WSDs, 
without sharing their drawbacks. U-relations are exponen- 
tially more succinct than both WSDs and ULDBs. Positive 
relational algebra queries are evaluated purely relationally 
on U-relations, a property not shared by any other previous 
succinct representation system. Also, U-relations are a sim- 
ple formalism which poses a small burden on implementors. 

We next briefly report on two current research directions. 
Probabilistic U-relations. U-relational databases can be
elegantly extended to model probabilistic information by
just adding a probability column P to the world table W.
For each variable x, the sum of the values
$\pi_P(\sigma_{Var=x}(W))$ must equal one. We can then assign a
probability to any subset of the world-set, described by a
ws-descriptor d, as the product of the probabilities of each
variable assignment in d.
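
The probabilistic extension can be sketched in a few lines of Python. The world table is represented here as a list of (variable, domain value, probability) triples; this layout, the function names, and the sample probabilities are assumptions made for illustration.

```python
import math
from collections import defaultdict


def is_valid_world_table(world_table):
    """Check that the probabilities of each variable sum to one."""
    totals = defaultdict(float)
    for var, _value, prob in world_table:
        totals[var] += prob
    return all(math.isclose(total, 1.0) for total in totals.values())


def descriptor_probability(world_table, descriptor):
    """Probability of the world-set described by a ws-descriptor."""
    prob_of = {(var, value): prob for var, value, prob in world_table}
    return math.prod(prob_of[(var, value)] for var, value in descriptor.items())


W = [("x", 1, 0.3), ("x", 2, 0.7), ("y", 1, 0.5), ("y", 2, 0.5)]
print(is_valid_world_table(W))
print(descriptor_probability(W, {"x": 1, "y": 2}))
```

The product is justified because distinct variables are chosen independently; an empty descriptor describes all worlds and receives probability one.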

The techniques for evaluating the operations of positive
relational algebra presented in this paper are applicable in
the probabilistic case without changes. Computing the
confidences of the answer tuples is an inherently hard problem
[10]. Our current research investigates practical
approximation techniques for confidence computation.



Support for new language constructs. Following our recent
investigation on uncertainty-aware language constructs
beyond relational algebra [5], we identified common physical
operators needed to implement many primitives for
the creation and grouping of worlds. It appears that
normalizing sets of ws-descriptors in the sense of Section 4
plays an important role in evaluating these operations and
in confidence computation. We are currently working on
secondary-storage algorithms for normalization.